A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds
Cloud platforms have emerged as a prominent environment for executing
high-performance computing (HPC) applications, providing on-demand resources as well
as scalability. They usually offer different classes of Virtual Machines (VMs)
which ensure different guarantees in terms of availability and volatility,
provisioning the same resource through multiple pricing models. For instance,
in the Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs
are unused instances available at a lower price. Despite the monetary
advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any
moment.
Using both hibernation-prone spot VMs (to reduce cost) and on-demand VMs, we
propose in this paper a static scheduling approach for HPC applications
composed of independent tasks (bag-of-tasks) with deadline constraints. However,
if a spot VM hibernates and does not resume within a time that guarantees
the application's deadline, a temporal failure takes place. Our scheduling,
thus, aims at minimizing the monetary cost of bag-of-tasks applications in the EC2
cloud while respecting their deadlines and avoiding temporal failures. To this end, our
algorithm statically creates two scheduling maps: (i) the first one contains,
for each task, its starting time and on which VM (i.e., an available spot or
on-demand VM with the current lowest price) the task should execute; (ii) the
second one contains, for each task allocated on a spot VM in the first map, its
starting time and on which on-demand VM it should be executed to meet the
application deadline in order to avoid temporal failures. The latter will be
used whenever the hibernation period of a spot VM exceeds a time limit.
Performance results from simulation with task execution traces, configuration
of Amazon EC2 VM classes, and VM market history confirm the effectiveness of
our scheduling and show that it tolerates temporal failures.
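The two scheduling maps described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the greedy cheapest-VM choice, the backward packing of backup slots from the deadline, and all names (`VM`, `build_maps`, `runtimes`) are assumptions made for illustration, and prices are treated as fixed rather than taken from market history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    market: str            # "spot" or "on-demand" (hypothetical labels)
    price_per_hour: float  # assumed constant for this sketch


def build_maps(runtimes, vms, deadline):
    """Return (primary, backup) maps: task -> (vm_name, start_time).

    runtimes: dict mapping task name to runtime in hours.
    primary:  every task on the cheapest VM that still meets the deadline.
    backup:   for each task placed on a spot VM, a late slot on an
              on-demand VM that still finishes by the deadline.
    """
    ready = {vm.name: 0.0 for vm in vms}  # next free time per VM
    primary = {}
    # Longest tasks first, so large tasks are placed while room remains.
    for task, rt in sorted(runtimes.items(), key=lambda kv: -kv[1]):
        fits = [vm for vm in vms if ready[vm.name] + rt <= deadline]
        if not fits:
            raise ValueError(f"no VM can run {task} before the deadline")
        vm = min(fits, key=lambda v: v.price_per_hour)
        primary[task] = (vm.name, ready[vm.name])
        ready[vm.name] += rt

    market = {vm.name: vm.market for vm in vms}
    on_demand = [vm for vm in vms if vm.market == "on-demand"]
    latest_free = {vm.name: deadline for vm in on_demand}
    spot_tasks = [t for t, (name, _) in primary.items() if market[name] == "spot"]
    backup = {}
    # Pack backup slots backward from the deadline, latest primary start first.
    for task in sorted(spot_tasks, key=lambda t: -primary[t][1]):
        if not on_demand:
            raise ValueError("no on-demand VM available for backups")
        target = max(on_demand, key=lambda v: latest_free[v.name])
        start = latest_free[target.name] - runtimes[task]
        # The slot must not overlap the VM's own primary work.
        if start < ready[target.name]:
            raise ValueError(f"no backup slot for {task}")
        backup[task] = (target.name, start)
        latest_free[target.name] = start
    return primary, backup
```

In this sketch, the backup start time of a task plays the role of the time limit mentioned above: if the spot VM is still hibernated at that moment, the task would have to start on its on-demand VM to meet the deadline.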
Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments
Federated Learning (FL) is a distributed Machine Learning (ML) technique that
can benefit from cloud environments while preserving data privacy. We propose
Multi-FedLS, a framework that manages multi-cloud resources, reducing execution
time and financial costs of Cross-Silo Federated Learning applications by using
preemptible VMs, which are cheaper than on-demand ones but can be revoked at any
time. Our framework comprises four modules: Pre-Scheduling, Initial Mapping,
Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work
\cite{brum2022sbac} by formally describing the Multi-FedLS resource manager
framework and its modules. Experiments were conducted with three Cross-Silo FL
applications on CloudLab, and a proof of concept confirms that Multi-FedLS can
be executed on a multi-cloud composed of AWS and GCP, two commercial cloud
providers. Results show that the problem of executing Cross-Silo FL
applications in multi-cloud environments with preemptible VMs can be
efficiently resolved using a mathematical formulation, fault tolerance
techniques, and a simple heuristic to choose a new VM in case of revocation.
Comment: In review by Journal of Parallel and Distributed Computing
A Hibernation Aware Dynamic Scheduler for Cloud Environments
Nowadays, cloud platforms usually offer several types of Virtual Machines (VMs) which have different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs are unused instances available at a lower price. Despite the monetary advantages, a spot VM can be terminated or hibernated by EC2 at any moment. In this work, we propose the Hibernation-Aware Dynamic Scheduler (HADS) to schedule applications composed of independent tasks (bag-of-tasks) with deadline constraints on both hibernation-prone spot VMs (to reduce cost) and on-demand VMs. We also consider the problem of temporal failures, which occur when a spot VM hibernates and does not resume within a time that guarantees the application's deadline. Our dynamic scheduling approach aims at minimizing the monetary cost of executing bag-of-tasks applications while respecting their deadlines even in the presence of hibernation. It also avoids temporal failures by using task migration and work-stealing techniques. Experimental results with real executions using Amazon EC2 VMs confirm the effectiveness of our scheduling, in terms of monetary costs and execution times, when compared with approaches based only on on-demand VMs. They also show that our strategy can tolerate temporal failures.
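The notion of a temporal failure used in this abstract can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the HADS implementation: the function names, the `migration_overhead` parameter, and the rule "migrate once the latest safe resume time has passed" are all hypothetical.

```python
def latest_resume_time(deadline, remaining_work, migration_overhead=0.0):
    """Latest time at which the remaining work can still start and finish
    by the deadline (all values in the same time unit)."""
    return deadline - remaining_work - migration_overhead


def should_migrate(now, deadline, remaining_work, migration_overhead=0.0):
    """True once waiting any longer for a hibernated spot VM to resume
    would cause a temporal failure (assumed policy for this sketch)."""
    return now >= latest_resume_time(deadline, remaining_work, migration_overhead)
```

For example, with a deadline of 10 hours, 3 hours of remaining work, and an assumed migration overhead of 0.5 hours, the tasks of a hibernated VM would have to move to another VM no later than hour 6.5.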